Network models of massive datasets
Similar resources
Network models of massive datasets
We give a brief overview of the methodology of modeling massive datasets arising in various applications as networks. This approach is often useful for extracting non-trivial information from the datasets by applying standard graph-theoretic techniques. We also point out that graphs representing datasets coming from diverse practical fields have a similar power-law structure, which indicates th...
A Pluggable Architecture for Building User Models From Massive Datasets
In many situations, it is common that a large single source of data serves as input to multiple application areas, each of which may use a different user model. It is often the case that each user model is assembled using a different process; in general, however, it is more efficient to have a single architecture for building different user models for different application areas. We propose an ...
Massive Datasets in Astronomy
Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. Starting from the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition is continuing today, and at an ever increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by the advances in telescope, detector, a...
Typing Massive JSON Datasets
Cloud-specific languages are usually untyped, and no guarantees about the correctness of complex jobs can be statically obtained. Datasets too are usually untyped and no schema information is needed for their manipulation. In this paper we sketch a typing algorithm for JSON datasets. Our approach can be used to infer a succinct type from scratch for a collection of JSON objects, as well as to v...
Clustering Massive Datasets
Clustering data is not an easy problem in general, and is compounded for a massive dataset. Restricting attention to a sample from the data ignores minority groups and hence compromises on the available riches. This paper develops, under Gaussian assumptions, a multi-stage sequential algorithm. After clustering an initial sample, observations that can be reasonably classified in the identified gr...
Journal
Journal title: Computer Science and Information Systems
Year: 2004
ISSN: 1820-0214, 2406-1018
DOI: 10.2298/csis0401075b